Improving the effectiveness of software prefetching with adaptive executions

Authors

  • Rafael H. Saavedra
  • Daeyeon Park
Abstract

The effectiveness of software prefetching for tolerating latency depends mainly on the ability of programmers and/or compilers to: 1) predict in advance the magnitude of the run-time remote memory latency, and 2) insert prefetches at a distance that minimizes stall time without causing cache pollution. Scalable heterogeneous multiprocessors, such as networks of computers (NOWs), present special challenges to static software prefetching because on these systems the network topology and node configuration are not completely determined at compile time. Furthermore, dynamic software prefetching cannot do much better because individual nodes on large heterogeneous NOWs tend to experience different remote memory delays over time. A fixed prefetch distance, even when computed at run time, cannot perform well for the whole duration of a software pipeline. Here we present an adaptive scheme for software prefetching that makes it possible for nodes to dynamically change not only the amount of prefetching but also the prefetch distance. Doing this makes it possible to tailor the execution of the software pipeline to the prevailing conditions affecting each node. We show how simple performance data collected by hardware monitors can allow programs to observe, evaluate, and change their prefetching policies. Our results show that, on the benchmarks we simulated, adaptive prefetching was able to improve performance over static and dynamic prefetching by 10% to 60%. More importantly, future increases in the heterogeneity and size of NOWs will increase the advantages of adaptive prefetching over static and dynamic schemes.
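
To make the idea of adjusting the prefetch distance from monitor data concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical hardware monitor that reports, per sampling interval, how many prefetches arrived too late and how many prefetched lines were evicted before use. The names `static_distance`, `MonitorSample`, `adapt_distance`, and the 10% threshold are all illustrative assumptions, and the sketch adjusts only the distance, not the amount of prefetching.

```python
import math
from dataclasses import dataclass


def static_distance(latency_cycles: int, iter_cycles: int) -> int:
    """Common rule of thumb for a fixed distance: prefetch far enough ahead
    (in loop iterations) to cover the expected memory latency."""
    return max(1, math.ceil(latency_cycles / iter_cycles))


@dataclass
class MonitorSample:
    """Hypothetical counters read from a hardware monitor for one interval."""
    late: int     # prefetches that completed after the demand access
    evicted: int  # prefetched lines evicted before being used (pollution)
    issued: int   # total prefetches issued in the interval


def adapt_distance(distance: int, sample: MonitorSample,
                   threshold: float = 0.1,
                   min_d: int = 1, max_d: int = 64) -> int:
    """Assumed policy: lengthen the distance when too many prefetches are
    late, shorten it when too many prefetched lines are evicted unused."""
    if sample.issued == 0:
        return distance  # nothing observed, keep the current policy
    late_ratio = sample.late / sample.issued
    evicted_ratio = sample.evicted / sample.issued
    if late_ratio > threshold and late_ratio >= evicted_ratio:
        return min(max_d, distance + 1)
    if evicted_ratio > threshold:
        return max(min_d, distance - 1)
    return distance


# Start from a compile-time estimate, then react to observed conditions.
d0 = static_distance(latency_cycles=200, iter_cycles=30)
d_up = adapt_distance(d0, MonitorSample(late=30, evicted=2, issued=100))
d_down = adapt_distance(d0, MonitorSample(late=1, evicted=40, issued=100))
d_same = adapt_distance(d0, MonitorSample(late=5, evicted=5, issued=100))
```

In this sketch each node would call `adapt_distance` once per sampling interval, so nodes that see different remote memory delays can settle on different distances over time.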

Similar resources

Implicit Acceleration of Critical Sections via Unsuccessful Speculation

The speculative execution of critical sections, whether done using HTM via the transactional lock elision pattern or using a software solution such as STM or a sequence lock, has the potential to improve software performance with minimal programmer effort. The technique improves performance by allowing critical sections to proceed in parallel as long as they do not conflict at run time. In this...

Unobtrusive Reactive Prefetching: A Multicore Approach for Exploiting Hot Streams in Cache Misses

Processor performance continues to outpace memory performance by a large margin. One approach for mitigating this gap is to employ software-based speculative prefetching. Software dynamic prefetchers are able to identify patterns more complex than those of hardware prefetchers while retaining the ability to respond to a program's dynamic behavior; however, modern techniques incur prohibitively hi...

Hardware and software cache prefetching techniques for MPEG benchmarks

With the popularity of multimedia acceleration instructions such as MMX, MPEG decompression is increasingly executed on general purpose processors instead of dedicated MPEG hardware. The gap between processor speed and memory access means that a significant amount of time is spent in the memory system. As processors get faster—both in terms of higher clock speeds and increased instruction level...

Optimizing Performance in Highly Utilized Multicores with Intelligent Prefetching

Khan, M. 2016. Optimizing Performance in Highly Utilized Multicores with Intelligent Prefetching. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1335. 54 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9450-6. Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefetching, to increase pe...

The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems

Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques for three types of applications: regular scientific codes, irregular scie...

Publication date: 1996